# Joins between cubes

Joins create relationships between cubes in the [data model][ref-data-model].

They allow to build complex [queries][ref-queries] that involve members from multiple
cubes. They also allow to [reference][ref-references] members from other cubes in
[calculated members][ref-calculated-members], [views][ref-views], and
[pre-aggregations][ref-preaggs].

When defining joins, it's important to understand [join types](#join-types) and the
[direction of joins](#direction-of-joins) as well as how [join paths](#join-paths) and
[join hints](#join-hints) are used to work with the joined cubes.

## Join types

Cube supports three [types of join relationships][ref-schema-ref-joins-relationship]
often found in SQL databases: `one_to_one`, `one_to_many`, and `many_to_one`.

For example, let's take two cubes, `customers` and `orders`:

<CodeTabs>

```javascript
cube(`customers`, {
  // ...

  dimensions: {
    id: {
      sql: `id`,
      type: `number`,
      primary_key: true
    },

    company: {
      sql: `company`,
      type: `string`
    }
  }
})

cube(`orders`, {
  // ...

  dimensions: {
    id: {
      sql: `id`,
      type: `number`,
      primary_key: true
    },

    customer_id: {
      sql: `customer_id`,
      type: `number`
    }
  }
})
```

```yaml
cubes:
  - name: customers
    # ...

    dimensions:
      - name: id
        sql: id
        type: number
        primary_key: true

      - name: company
        sql: company
        type: string

  - name: orders
    # ...

    dimensions:
      - name: id
        sql: id
        type: number
        primary_key: true

      - name: customer_id
        sql: customer_id
        type: number
```

</CodeTabs>

We could add a join to the `customers` cube:

<CodeTabs>

```javascript
cube(`customers`, {
  // ...

  joins: {
    orders: {
      relationship: `one_to_many`,
      sql: `${CUBE}.id = ${orders.customer_id}`
    }
  }
})
```

```yaml
cubes:
  - name: customers
    # ...

    joins:
      - name: orders
        relationship: one_to_many
        sql: "{CUBE}.id = {orders.customer_id}"
```

</CodeTabs>

The join above means a customer has many orders. Let's send the following JSON
query:

```json
{
  "dimensions": ["orders.status", "customers.company"],
  "measures": ["orders.count"],
  "timeDimensions": [
    {
      "dimension": "orders.created_at"
    }
  ],
  "order": { "customers.company": "asc" }
}
```

The query above will generate the following SQL:

```sql
SELECT
  "orders".status "orders__status",
  "customers".company "customers__company",
  count("orders".id) "orders__count"
FROM
  public.customers AS "customers"
  LEFT JOIN public.orders AS "orders"
    ON "customers".id = "orders".customer_id
GROUP BY 1, 2
ORDER BY 2 ASC
LIMIT 10000
```

However, if we have guest checkouts, that would mean we would have orders with
no matching customer. Looking back at the `one_to_many` relationship and its'
resulting SQL, any guest checkouts would be excluded from the results. To remedy
this, we'll remove the join from the `customers` cube and instead define a join
with a `many_to_one` relationship on the `orders` cube:

<CodeTabs>

```javascript
cube(`orders`, {
  // ...

  joins: {
    customers: {
      relationship: `many_to_one`,
      sql: `${CUBE}.customer_id = ${customers.id}`
    }
  }
})
```

```yaml
cubes:
  - name: orders
    # ...

    joins:
      - name: customers
        relationship: many_to_one
        sql: "{CUBE}.customer_id = {customers.id}"
```

</CodeTabs>

In the above data model, our `orders` cube defines the relationship between
itself and the `customer` cube. The same JSON query now results in the following
SQL query:

```sql
SELECT
  "orders".status "orders__status",
  "customers".company "customers__company",
  count("orders".id) "orders__count"
FROM
  public.orders AS "orders"
  LEFT JOIN public.customers AS "customers"
    ON "orders".customer_id = "customers".id
GROUP BY 1, 2
ORDER BY 2 ASC
LIMIT 10000
```

As we can see, the base table in the query is `orders`, and `customers` is in
the `LEFT JOIN` clause; this means any orders without a customer will also be
retrieved.

### Many-to-many joins

A many-to-many relationship occurs when multiple records in a cube are
associated with multiple records in another cube.

For example, let's say we have two cubes, `topics` and `posts`, pointing to the
`topics` and `posts` tables in our database, respectively. A `post` can have
more than one `topic`, and a `topic` may have more than one `post`.

In a database, you would most likely have an associative table (also known as a
junction table or cross-reference table). In our example, this table name might
be `post_topics`.

The diagram below shows the tables `posts`, `topics`, `post_topics`, and their
relationships.

<Diagram
  alt="Many-to-Many Entity Diagram for posts, topics and post_topics"
  src="https://ucarecdn.com/61343995-dedc-40ae-9367-e21a645051ee/"
/>

In the same way the `post_topics` table was specifically created to handle this
association in the database, we need to create an associative cube
`post_topics`, and declare the relationships from it to `topics` cube and from
`posts` to `post_topics`.

<CodeTabs>

```javascript
cube(`posts`, {
  sql_table: `posts`,

  joins: {
    post_topics: {
      relationship: `one_to_many`,
      sql: `${CUBE}.id = ${post_topics.post_id}`
    }
  }
})

cube(`topics`, {
  sql_table: `topics`,

  dimensions: {
    post_id: {
      sql: `id`,
      type: `string`,
      primary_key: true
    }
  }
})

cube(`post_topics`, {
  sql_table: `post_topics`,

  joins: {
    topic: {
      relationship: `many_to_one`,
      sql: `${CUBE}.topic_id = ${topics.id}`
    }
  },

  dimensions: {
    post_id: {
      sql: `post_id`,
      type: `string`
    }
  }
})
```

```yaml
cubes:
  - name: posts
    sql_table: posts

    joins:
      - name: post_topics
        relationship: one_to_many
        sql: "{CUBE}.id = {post_topics.post_id}"

  - name: topics
    sql_table: topics

    dimensions:
      - name: post_id
        sql: id
        type: string
        primary_key: true

  - name: post_topics
    sql_table: post_topics

    joins:
      - name: topic
        relationship: many_to_one
        sql: "{CUBE}.topic_id = {topics.id}"

    dimensions:
      - name: post_id
        sql: post_id
        type: string
```

</CodeTabs>

<InfoBox>

The following example uses the `many_to_one` relationship on the `post_topics`
cube; this causes the direction of joins to be `posts → post_topics → topics`.
Read more about the [direction of joins](#direction-of-joins).

</InfoBox>

In scenarios where a table doesn't define a primary key, one can be generated
using SQL:

<CodeTabs>

```javascript
cube(`post_topics`, {
  // ...

  dimensions: {
    id: {
      sql: `CONCAT(${CUBE}.post_id, ${CUBE}.topic_id)`,
      type: `number`,
      primary_key: true
    }
  }
})
```

```yaml
cubes:
  - name: post_topics
    # ...

    dimensions:
      - name: id
        sql: "CONCAT({CUBE}.post_id, {CUBE}.topic_id)"
        type: number
        primary_key: true
```

</CodeTabs>

**Virtual associative cubes.**
Sometimes there is no associative table in the database, when in reality, there
is a many-to-many relationship. In this case, the solution is to extract some
data from existing tables and create a virtual (not backed by a real table in
the database) associative cube.

Let’s consider the following example. We have tables `emails` and
`transactions`. The goal is to calculate the amount of transactions per
campaign. Both `emails` and `transactions` have a `campaign_id` column. We don’t
have a campaigns table, but data about campaigns is part of the `emails` table.

Let’s take a look at the `emails` cube first:

<CodeTabs>

```javascript
cube(`emails`, {
  sql_table: `emails`,

  measures: {
    count: {
      type: `count`
    }
  },

  dimensions: {
    id: {
      sql: `id`,
      type: `number`,
      primary_key: true
    },

    campaign_name: {
      sql: `campaign_name`,
      type: `string`
    },

    campaign_id: {
      sql: `campaign_id`,
      type: `number`
    }
  }
})
```

```yaml
cubes:
  - name: emails
    sql_table: emails

    measures:
      - name: count
        type: count

    dimensions:
      - name: id
        sql: id
        type: number
        primary_key: true

      - name: campaign_name
        sql: campaign_name
        type: string

      - name: campaign_id
        sql: campaign_id
        type: number
```

</CodeTabs>

We can extract campaigns data into a virtual `campaigns` cube:

<CodeTabs>

```javascript
cube(`campaigns`, {
  sql: `
    SELECT
      campaign_id,
      campaign_name,
      customer_name,
      MIN(created_at) AS started_at
    FROM emails
    GROUP BY 1, 2, 3
  `,

  measures: {
    count: {
      type: `count`
    }
  },

  dimensions: {
    id: {
      sql: `campaign_id`,
      type: `string`,
      primary_key: true
    },

    name: {
      sql: `campaign_name`,
      type: `string`
    }
  }
})
```

```yaml
cubes:
  - name: campaigns
    sql: |
      SELECT
        campaign_id,
        campaign_name,
        customer_name,
        MIN(created_at) AS started_at
      FROM emails GROUP BY 1, 2, 3

    measures:
      - name: count
        type: count

    dimensions:
      - name: id
        sql: campaign_id
        type: string
        primary_key: true

      - name: name
        sql: campaign_name
        type: string
```

</CodeTabs>

The following diagram shows our data model with the `Campaigns` cube:

<Diagram
  alt="Many-to-Many Entity Diagram for emails, campaigns and transactions"
  src="https://ucarecdn.com/9803e369-cb00-4e00-9c61-1a3d3e90873e/"
/>

The last piece is to finally declare a many-to-many relationship. This should be
done by declaring a [`one_to_many`
relationship][ref-schema-ref-joins-relationship] on the associative cube,
`campaigns` in our case.

<CodeTabs>

```javascript
cube(`emails`, {
  sql_table: `emails`,

  joins: {
    campaigns: {
      relationship: `many_to_one`,
      sql: `${CUBE}.campaign_id = ${campaigns.id}
      AND ${CUBE}.customer_name = ${campaigns.customer_name}`
    }
  },

  measures: {
    count: {
      type: `count`
    }
  },

  dimensions: {
    id: {
      sql: `id`,
      type: `number`,
      primary_key: true
    },

    campaign_name: {
      sql: `campaign_name`,
      type: `string`
    },

    campaign_id: {
      sql: `campaign_id`,
      type: `number`
    }
  }
})

cube(`campaigns`, {
  joins: {
    transactions: {
      relationship: `one_to_many`,
      sql: `${CUBE}.customer_name = ${transactions.customer_name}
      AND ${CUBE}.campaign_id = ${transactions.campaign_id}`
    }
  },

  dimensions: {
    id: {
      sql: `id`,
      type: `number`,
      primary_key: true
    },

    customer_name: {
      sql: `customer_name`,
      type: `string`
    }
  }
})
```

```yaml
cubes:
  - name: emails
    sql_table: emails

    joins:
      - name: campaigns
        relationship: many_to_one
        sql: |
          {CUBE}.campaign_id = {campaigns.id} AND {CUBE}.customer_name =
          {campaigns.customer_name}

    measures:
      - name: count
        type: count

    dimensions:
      - name: id
        sql: id
        type: number
        primary_key: true

      - name: campaign_name
        sql: campaign_name
        type: string

      - name: campaign_id
        sql: campaign_id
        type: number

  - name: campaigns

    joins:
      - name: transactions
        relationship: one_to_many
        sql: |
          {CUBE}.customer_name = {transactions.customer_name} AND
          {CUBE}.campaign_id = {transactions.campaign_id}

    dimensions:
      - name: id
        sql: id
        type: number
        primary_key: true

      - name: customer_name
        sql: customer_name
        type: string
```

</CodeTabs>

## Join tree

When Cube analyzes a [query][ref-queries], it builds a _join tree_ that connects all
cubes involved in the query in compliance with the [direction of joins](#direction-of-joins).
If the join tree cannot be built, the query will fail to execute. The presence of
[bidirectional joins](#bidirectional-joins) or [diamond subgraphs](#diamond-subgraphs)
can complicate the join tree structure.

### Direction of joins

**All joins in Cube's data model are _directed_.** They flow from the source cube (the one
where the join is defined) to the target cube (the one referenced in the join).

Cube will respect the join graph when generating SQL queries. It means that _source_
cubes will be on the left side of `JOIN` clauses and _target_ cubes will be on the right
side of `JOIN` clauses.  Consider the following data model, consisting of cubes `left`
and `right`:

<CodeTabs>

```yaml
cubes:
  - name: left
    sql: |
      SELECT 1 AS id, 11 AS value UNION ALL
      SELECT 2 AS id, 12 AS value UNION ALL
      SELECT 3 AS id, 13 AS value

    dimensions:
      - name: id
        sql: id
        type: number
        primary_key: true

      - name: value
        sql: value
        type: number

    joins:
      - name: right
        sql: "{left.id} = {right.id}"
        relationship: one_to_one

  - name: right
    sql: |
      SELECT 1 AS id, 101 AS value UNION ALL
      SELECT 2 AS id, 102 AS value UNION ALL
      SELECT 3 AS id, 103 AS value

    dimensions:
      - name: id
        sql: id
        type: number
        primary_key: true

      - name: value
        sql: value
        type: number

    # joins:
    #   - name: left
    #     sql: "{left.id} = {right.id}"
    #     relationship: one_to_one
```

```javascript
cube(`left`, {
  sql: `
    SELECT 1 AS id, 11 AS value UNION ALL
    SELECT 2 AS id, 12 AS value UNION ALL
    SELECT 3 AS id, 13 AS value
  `,

  dimensions: {
    id: {
      sql: `id`,
      type: `number`,
      primary_key: true
    },

    value: {
      sql: `value`,
      type: `number`
    }
  },

  joins: {
    right: {
      sql: `${left.id} = ${right.id}`,
      relationship: `one_to_one`
    }
  }
})

cube(`right`, {
  sql: `
    SELECT 1 AS id, 101 AS value UNION ALL
    SELECT 2 AS id, 102 AS value UNION ALL
    SELECT 3 AS id, 103 AS value
  `,

  dimensions: {
    id: {
      sql: `id`,
      type: `number`,
      primary_key: true
    },

    value: {
      sql: `value`,
      type: `number`
    }
  }

  // joins: {
  //   left: {
  //     sql: `${left.id} = ${right.id}`,
  //     relationship: `one_to_one`
  //   }
  // }
})
```

</CodeTabs>

It defines a join that is directed from `left` to `right`. If you query for `left.value`
and `right.value`, Cube will generate the following SQL query. As you can see, `left` is
on the left side of the `JOIN` clause, and `right` is on the right side:

```sql
SELECT
  "left".value "left__value",
  "right".value "right__value"
FROM (
  SELECT 1 AS id, 11 AS value UNION ALL
  SELECT 2 AS id, 12 AS value UNION ALL
  SELECT 3 AS id, 13 AS value
) AS "left"
LEFT JOIN (
  SELECT 1 AS id, 101 AS value UNION ALL
  SELECT 2 AS id, 102 AS value UNION ALL
  SELECT 3 AS id, 103 AS value
) AS "right" ON "left".id = "right".id
GROUP BY 1, 2
```

If you comment out the join definition in the `left` cube and uncomment the join
definition in the `right` cube, Cube will generate the following SQL for the same query:

```sql
SELECT
  "left".value "left__value",
  "right".value "right__value"
FROM (
  SELECT 1 AS id, 101 AS value UNION ALL
  SELECT 2 AS id, 102 AS value UNION ALL
  SELECT 3 AS id, 103 AS value
) AS "right"
LEFT JOIN (
  SELECT 1 AS id, 11 AS value UNION ALL
  SELECT 2 AS id, 12 AS value UNION ALL
  SELECT 3 AS id, 13 AS value
) AS "left" ON "left".id = "right".id
GROUP BY 1, 2
```

As you can see, the direction of joins greatly influences the generated SQL and,
conseqently, the final result set.

### Bidirectional joins

**As a rule of thumb, it's not recommended to define _bidirectional joins_** in the data
model (i.e., having both cubes define a join to each other) by default. However, it can
still be useful for some valid analytical use cases.

Consider the following data model with `orders` and `customers` for an e-commerce that
has both registered and guest customers (they have `NULL` as `customer_id`):

<CodeTabs>

```javascript
cube(`orders`, {
  sql: `
    SELECT 1 AS id, 1001 AS customer_id, 123 AS revenue UNION ALL
    SELECT 2 AS id, 1001 AS customer_id, 234 AS revenue UNION ALL
    SELECT 3 AS id, 1002 AS customer_id, 345 AS revenue UNION ALL
    SELECT 4 AS id, NULL AS customer_id, 456 AS revenue
  `,

  dimensions: {
    id: {
      sql: `id`,
      type: `number`,
      primary_key: true
    },

    customer_id: {
      sql: `customer_id`,
      type: `number`
    }
  },

  measures: {
    order_count: {
      type: `count`
    },

    total_revenue: {
      sql: `revenue`,
      type: `sum`
    }
  },

  joins: {
    customers: {
      sql: `${orders.customer_id} = ${customers.id}`,
      relationship: `many_to_one`
    }
  }
})

cube(`customers`, {
  sql: `
    SELECT 1001 AS id, 'Alice' AS name UNION ALL
    SELECT 1002 AS id, 'Bob' AS name UNION ALL
    SELECT 1003 AS id, 'Eve' AS name
  `,

  dimensions: {
    id: {
      sql: `id`,
      type: `number`,
      primary_key: true
    },

    name: {
      sql: `name`,
      type: `string`
    }
  },

  measures: {
    customer_count: {
      type: `count`
    }
  }

  // joins: {
  //   orders: {
  //     sql: `${orders.customer_id} = ${customers.id}`,
  //     relationship: `one_to_many`
  //   }
  // }
})
```

```yaml
cubes:
  - name: orders
    sql: |
      SELECT 1 AS id, 1001 AS customer_id, 123 AS revenue UNION ALL
      SELECT 2 AS id, 1001 AS customer_id, 234 AS revenue UNION ALL
      SELECT 3 AS id, 1002 AS customer_id, 345 AS revenue UNION ALL
      SELECT 4 AS id, NULL AS customer_id, 456 AS revenue

    dimensions:
      - name: id
        sql: id
        type: number
        primary_key: true

      - name: customer_id
        sql: customer_id
        type: number

    measures:
      - name: order_count
        type: count

      - name: total_revenue
        sql: revenue
        type: sum

    joins:
      - name: customers
        sql: "{orders.customer_id} = {customers.id}"
        relationship: many_to_one

  - name: customers
    sql: |
      SELECT 1001 AS id, 'Alice' AS name UNION ALL
      SELECT 1002 AS id, 'Bob' AS name UNION ALL
      SELECT 1003 AS id, 'Eve' AS name

    dimensions:
      - name: id
        sql: id
        type: number
        primary_key: true

      - name: name
        sql: name
        type: string

    measures:
      - name: customer_count
        type: count

    # joins:
    #   - name: orders
    #     sql: "{orders.customer_id} = {customers.id}"
    #     relationship: one_to_many
```

</CodeTabs>

Querying `customers.name` and `orders.order_count` will produce the following result:

<Screenshot src="https://lgo0ecceic.ucarecd.net/f65257ee-486a-4069-8b9b-8617b6b83e5f/"/>

As you can see, the result set only includes registered customers; we get no data for
guest checkouts. Check the generated SQL query:

```sql
SELECT
  "customers".name "customers__name",
  count("orders".id) "orders__order_count"
FROM (
  SELECT 1001 AS id, 'Alice' AS name UNION ALL
  SELECT 1002 AS id, 'Bob' AS name UNION ALL
  SELECT 1003 AS id, 'Eve' AS name
) AS "customers"
LEFT JOIN (
  SELECT 1 AS id, 1001 AS customer_id, 123 AS revenue UNION ALL
  SELECT 2 AS id, 1001 AS customer_id, 234 AS revenue UNION ALL
  SELECT 3 AS id, 1002 AS customer_id, 345 AS revenue UNION ALL
  SELECT 4 AS id, NULL AS customer_id, 456 AS revenue
) AS "orders" ON "orders".customer_id = "customers".id
GROUP BY 1
```

The `customers` cube is on the left side of the `JOIN` clause, since the direction of
joins is from `customers` to `orders`. This means that the query will only return
registered customers and all orders by guest customers will be excluded.

Now, if you uncomment the join definition in the `orders` cube and comment out the
join definition in the `customers` cube, running the same query will produce the following
result:

<Screenshot src="https://lgo0ecceic.ucarecd.net/1f5c764e-9f9d-456a-887a-42fec52319ac/"/>

As you can see, now the result set includes guest checkouts, but we have no data for
registered customers who have not placed any orders (namely, `Eve`). Check the
generated SQL query, which reveals why:

```sql
SELECT
  "customers".name "customers__name",
  count("orders".id) "orders__order_count"
FROM (
  SELECT 1 AS id, 1001 AS customer_id, 123 AS revenue UNION ALL
  SELECT 2 AS id, 1001 AS customer_id, 234 AS revenue UNION ALL
  SELECT 3 AS id, 1002 AS customer_id, 345 AS revenue UNION ALL
  SELECT 4 AS id, NULL AS customer_id, 456 AS revenue
) AS "orders"
LEFT JOIN (
  SELECT 1001 AS id, 'Alice' AS name UNION ALL
  SELECT 1002 AS id, 'Bob' AS name UNION ALL
  SELECT 1003 AS id, 'Eve' AS name
) AS "customers" ON "orders".customer_id = "customers".id
GROUP BY 1
```

**Bidirectional joins often lead to the ambiguity in the data model** and can produce
ambiguous results, as Cube may not know which direction to follow when generating SQL
queries. You can remove the ambiguity by using [join paths](#join-paths) and [join
hints](#join-hints).

### Diamond subgraphs

A _diamond subgraph_ is a specific type of join structure where there's more than one
join path between two cubes, e.g., `users.schools.countries` and
`users.employers.countries`. Join structures like `a.b.c` + `a.c` or `a.b.c.d` + `a.b.d`
are also be considered diamond subgraphs for the purpose of this section.

In the following example, four cubes are joined together as a _diamond_: `a` joins to `b`
and `c`, and both `b` and `c` join to `d`:

<CodeTabs>

```yaml
cubes:
  - name: a
    sql: |
      SELECT 1 AS id UNION ALL
      SELECT 2 AS id UNION ALL
      SELECT 3 AS id

    dimensions:
      - name: id
        sql: id
        type: number
        primary_key: true

      - name: d_via_b
        sql: "{b.d.id}"
        type: number

      - name: d_via_c
        sql: "{c.d.id}"
        type: number

    joins:
      - name: b
        sql: "{a.id} = {b.id}"
        relationship: one_to_one

      - name: c
        sql: "{a.id} = {c.id}"
        relationship: one_to_one

  - name: b
    sql: |
      SELECT 1 AS id UNION ALL
      SELECT 2 AS id UNION ALL
      SELECT 3 AS id

    dimensions:
      - name: id
        sql: id
        type: number
        primary_key: true

    joins:
      - name: d
        sql: "{b.id} = {d.id}"
        relationship: one_to_one

  - name: c
    sql: |
      SELECT 1 AS id UNION ALL
      SELECT 2 AS id UNION ALL
      SELECT 3 AS id

    dimensions:
      - name: id
        sql: id
        type: number
        primary_key: true

    joins:
      - name: d
        sql: "{c.id} = {d.id}"
        relationship: one_to_one

  - name: d
    sql: |
      SELECT 1 AS id UNION ALL
      SELECT 2 AS id UNION ALL
      SELECT 3 AS id

    dimensions:
      - name: id
        sql: id
        type: number
        primary_key: true
```

```javascript
cube(`a`, {
  sql: `
    SELECT 1 AS id UNION ALL
    SELECT 2 AS id UNION ALL
    SELECT 3 AS id
  `,

  dimensions: {
    id: {
      sql: `id`,
      type: `number`,
      primary_key: true
    },

    d_via_b: {
      sql: `${b.d.id}`,
      type: `number`
    },

    d_via_c: {
      sql: `${c.d.id}`,
      type: `number`
    }
  },

  joins: {
    b: {
      sql: `${a.id} = ${b.id}`,
      relationship: `one_to_one`
    },

    c: {
      sql: `${a.id} = ${c.id}`,
      relationship: `one_to_one`
    }
  }
})

cube(`b`, {
  sql: `
    SELECT 1 AS id UNION ALL
    SELECT 2 AS id UNION ALL
    SELECT 3 AS id
  `,

  dimensions: {
    id: {
      sql: `id`,
      type: `number`,
      primary_key: true
    }
  },

  joins: {
    d: {
      sql: `${b.id} = ${d.id}`,
      relationship: `one_to_one`
    }
  }
})

cube(`c`, {
  sql: `
    SELECT 1 AS id UNION ALL
    SELECT 2 AS id UNION ALL
    SELECT 3 AS id
  `,

  dimensions: {
    id: {
      sql: `id`,
      type: `number`,
      primary_key: true
    }
  },

  joins: {
    d: {
      sql: `${c.id} = ${d.id}`,
      relationship: `one_to_one`
    }
  }
})

cube(`d`, {
  sql: `
    SELECT 1 AS id UNION ALL
    SELECT 2 AS id UNION ALL
    SELECT 3 AS id
  `,

  dimensions: {
    id: {
      sql: `id`,
      type: `number`,
      primary_key: true
    }
  }
})
```

</CodeTabs>

When querying `a.d_via_b`, Cube will generate the following SQL query, joining through
`b`:

```sql
SELECT
  "d".id "a__d_via_b"
FROM (
  SELECT 1 AS id UNION ALL
  SELECT 2 AS id UNION ALL
  SELECT 3 AS id
) AS "a"
LEFT JOIN (
  SELECT 1 AS id UNION ALL
  SELECT 2 AS id UNION ALL
  SELECT 3 AS id
) AS "b" ON "a".id = "b".id
LEFT JOIN (
  SELECT 1 AS id UNION ALL
  SELECT 2 AS id UNION ALL
  SELECT 3 AS id
) AS "d" ON "b".id = "d".id
GROUP BY 1
```

However, when querying `a.d_via_c`, Cube will generate the following SQL query, joining
through `c`:

```sql
SELECT
  "d".id "a__d_via_c"
FROM (
  SELECT 1 AS id UNION ALL
  SELECT 2 AS id UNION ALL
  SELECT 3 AS id
) AS "a"
LEFT JOIN (
  SELECT 1 AS id UNION ALL
  SELECT 2 AS id UNION ALL
  SELECT 3 AS id
) AS "c" ON "a".id = "c".id
LEFT JOIN (
  SELECT 1 AS id UNION ALL
  SELECT 2 AS id UNION ALL
  SELECT 3 AS id
) AS "d" ON "c".id = "d".id
GROUP BY 1
```

<WarningBox>

Note that having both `a.d_via_b` and `a.d_via_c` in the same query will produce
ambiguous results, as Cube will only use `d` to the generated SQL once:

```sql
SELECT
  "d".id "a__d_via_b",
  "d".id "a__d_via_c"
FROM (
  SELECT 1 AS id UNION ALL
  SELECT 2 AS id UNION ALL
  SELECT 3 AS id
) AS "a"
LEFT JOIN (
  SELECT 1 AS id UNION ALL
  SELECT 2 AS id UNION ALL
  SELECT 3 AS id
) AS "b" ON "a".id = "b".id
LEFT JOIN (
  SELECT 1 AS id UNION ALL
  SELECT 2 AS id UNION ALL
  SELECT 3 AS id
) AS "d" ON "b".id = "d".id
LEFT JOIN (
  SELECT 1 AS id UNION ALL
  SELECT 2 AS id UNION ALL
  SELECT 3 AS id
) AS "c" ON "a".id = "c".id
GROUP BY 1, 2
```

</WarningBox>

Similar to bidirectional joins, **diamond subgraphs often lead to the ambiguity in the
data model** and can produce ambiguous results, as Cube may not know which direction to
follow when generating SQL queries. You can remove the ambiguity by using [join
paths](#join-paths) and [join hints](#join-hints).

## Join paths

_Join paths_ serve as the recommended mechanism to remove the ambiguity of joins in the
data model, including cases of [bidirectional joins](#bidirectional-joins) and [diamond
subgraphs](#diamond-subgraphs).

A join path is defined as a sequence of cube names, separated by dots, that will be
followed in that specific order when resolving a cube or its member, e.g.,
`users.locations.countries.flag_aspect_ratio`.

### Calculated members

When referencing a member of another cube in a [calculated member][ref-calculated-members],
you can use a join path to specify the path to follow, as demonstrated in the [diamond
subgraphs example](#diamond-subgraphs).

### Views

When referencing cubes in a [view][ref-views], you literally provide join paths via the
[`join_path` parameter][ref-view-join-path]. The [bidirectional joins
example](#bidirectional-joins) can be disambiguated with the following views:

<CodeTabs>

```javascript
view(`total_revenue_per_customer`, {
  cubes: [
    {
      join_path: orders,
      includes: ['total_revenue']
    },
    {
      join_path: orders.customers,
      includes: ['name']
    }
  ]
})

view(`customers_without_orders`, {
  cubes: [
    {
      join_path: customers,
      includes: ['name']
    },
    {
      join_path: customers.orders,
      includes: ['order_count']
    }
  ]
})
```

```yaml
views:
  - name: total_revenue_per_customer
    cubes:
      - join_path: orders
        includes:
          - total_revenue

      - join_path: orders.customers
        includes:
          - name

  - name: customers_without_orders
    cubes:
      - join_path: customers
        includes:
          - name

      - join_path: customers.orders
        includes:
          - order_count
```

</CodeTabs>

### Pre-aggregations

When referencing members of another cubes in a [pre-aggregation][ref-preaggs], you can
also use join paths, as shown in the following example:

<CodeTabs>

```yaml
cubes:
  - name: a
    sql: |
      SELECT 1 AS id, 1 AS b_id, 1 AS c_id UNION ALL
      SELECT 2 AS id, 2 AS b_id, 2 AS c_id

    dimensions:
      - name: id
        sql: id
        type: number
        primary_key: true

      - name: dimension1
        sql: "CONCAT({CUBE}.b_id, {CUBE}.b_id)"
        type: string

    measures:
      - name: measure1
        type: count

    joins:
      - name: b
        relationship: one_to_one
        sql: "{CUBE}.b_id = {b.id}"

      - name: c
        relationship: one_to_one
        sql: "{CUBE}.c_id = {c.id}"

    pre_aggregations:
      - name: a_and_c
        dimensions:
          - a.dimension1
          - a.b.c.dimension2
        measures:
          - a.measure1
          - a.b.c.measure2

  - name: b
    sql: |
      SELECT 1 AS id, 1 AS c_id UNION ALL
      SELECT 2 AS id, 2 AS c_id

    dimensions:
      - name: id
        sql: id
        type: number
        primary_key: true

    joins:
      - name: c
        relationship: one_to_one
        sql: "{CUBE}.c_id = {c.id}"

  - name: c
    sql: |
      SELECT 1 AS id UNION ALL
      SELECT 2 AS id

    dimensions:
      - name: id
        sql: id
        type: number
        primary_key: true

      - name: dimension2
        sql: "{CUBE}.id * 3"
        type: string

    measures:
      - name: measure2
        sql: "{CUBE.dimension2}"
        type: sum
```

```javascript
cube(`a`, {
  sql: `
    SELECT 1 AS id, 1 AS b_id, 1 AS c_id UNION ALL
    SELECT 2 AS id, 2 AS b_id, 2 AS c_id
  `,

  dimensions: {
    id: {
      sql: `id`,
      type: `number`,
      primary_key: true
    },

    dimension1: {
      sql: `CONCAT(${CUBE}.b_id, ${CUBE}.b_id)`,
      type: `string`
    }
  },

  measures: {
    measure1: {
      type: `count`
    }
  },

  joins: {
    b: {
      relationship: `one_to_one`,
      sql: `${CUBE}.b_id = ${b.id}`
    },

    c: {
      relationship: `one_to_one`,
      sql: `${CUBE}.c_id = ${c.id}`
    }
  },

  pre_aggregations: {
    a_and_c: {
      dimensions: [
        `a.dimension1`,
        `a.b.c.dimension2`
      ],
      measures: [
        `a.measure1`,
        `a.b.c.measure2`
      ]
    }
  }
})

cube(`b`, {
  sql: `
    SELECT 1 AS id, 1 AS c_id UNION ALL
    SELECT 2 AS id, 2 AS c_id
  `,

  dimensions: {
    id: {
      sql: `id`,
      type: `number`,
      primary_key: true
    }
  },

  joins: {
    c: {
      relationship: `one_to_one`,
      sql: `${CUBE}.c_id = ${c.id}`
    }
  }
})

cube(`c`, {
  sql: `
    SELECT 1 AS id UNION ALL
    SELECT 2 AS id
  `,

  dimensions: {
    id: {
      sql: `id`,
      type: `number`,
      primary_key: true
    },

    dimension2: {
      sql: `${CUBE}.id * 3`,
      type: `string`
    }
  },

  measures: {
    measure2: {
      sql: `${CUBE.dimension2}`,
      type: `sum`
    }
  }
})
```

</CodeTabs>

When this pre-aggregation is built, Cube will generate and execute the following SQL
query:

```sql
CREATE TABLE prod_pre_aggregations.a_a_and_c AS
SELECT
  "c".id * 3 "c__dimension2",
  CONCAT("a".b_id, "a".b_id) "a__dimension1",
  sum("c".id * 3) "c__measure2",
  count("a".id) "a__measure1"
FROM (
  SELECT 1 AS id, 1 AS b_id, 1 AS c_id UNION ALL
  SELECT 2 AS id, 2 AS b_id, 2 AS c_id
) AS "a"
LEFT JOIN (
  SELECT 1 AS id, 1 AS c_id UNION ALL
  SELECT 2 AS id, 2 AS c_id
) AS "b" ON "a".b_id = "b".id
LEFT JOIN (
  SELECT 1 AS id UNION ALL
  SELECT 2 AS id
) AS "c" ON "b".c_id = "c".id
GROUP BY 1, 2
```

As you can see, in this query, `a` is joined to `c` though `b` here, even though a
direct join from `a` to `c` is also defined in the data model. However, Cube respects
join paths from the pre-aggregation definition and uses them to generate the SQL query.

## Join hints

_Join hints_ serve as an auxiliary mechanism to remove the ambiguity of joins _at the
query time_. When possible, use [join paths](#join-paths) in the data model instead.

### SQL API

In queries to the [SQL API][ref-sql-api], cubes can be [joined][ref-sql-api-joins] via
the `CROSS JOIN` clause or via `__cubeJoinField`. In any case, Cube will analyze the
query and follow provided join hints.

Let's run the following query with the data model from the [diamond subgraphs
example](#diamond-subgraphs):

```sql
SELECT
  a.id AS id,
  d.id AS d_via_b
FROM a
CROSS JOIN b
CROSS JOIN d
GROUP BY 1, 2
```

The following SQL query will be generated:

```sql
SELECT
  "a".id "a__id",
  "d".id "d__id"
FROM (
  SELECT 1 AS id UNION ALL
  SELECT 2 AS id UNION ALL
  SELECT 3 AS id
) AS "a"
LEFT JOIN (
  SELECT 1 AS id UNION ALL
  SELECT 2 AS id UNION ALL
  SELECT 3 AS id
) AS "b" ON "a".id = "b".id
LEFT JOIN (
  SELECT 1 AS id UNION ALL
  SELECT 2 AS id UNION ALL
  SELECT 3 AS id
) AS "d" ON "b".id = "d".id
GROUP BY 1, 2
```

If the SQL API query contained `CROSS JOIN c` instead of `CROSS JOIN b`, then the
generated SQL query would contain a join through `c` instead of `b`.

### REST API

In queries to the [REST API][ref-rest-api], join hints can be provided via the
[`joinHints` parameter][ref-rest-api-join-hints].

The SQL API query from the previous section can be rewritten as a REST API query
with join hints as follows:

```json
{
  "dimensions": [
    "a.id",
    "d.id"
  ],
  "joinHints": [
    ["a", "b"],
    ["b", "d"]
  ]
}
```

The same SQL query as in the previous section will be generated.

Similarly, if the `joinHints` parameter contained `[["a", "c"], ["c", "d"]]` instead of
`[["a", "b"], ["b", "d"]]`, then the generated SQL query would contain a join through
`c` instead of `b`.

## Troubleshooting

### `Can't find join path`

Sometimes, you might come across the following error message: `Can't find join path to
join 'cube_a', 'cube_b'`.

It indicates that a query failed because it includes members from cubes that can't be
joined in order to generate a valid query to the upstream data source.
Please check that you've defined necessary joins and that they have [correct
directions](#transitive-join-pitfalls).

Also, please consider using [views][ref-schema-ref-view] since they
incapsulate join paths and completely remove the possibility of the error in question.
You might also consider setting the [`public` parameter][ref-cube-public] to `false`
on your cubes to hide them from end users.

If you’re building a custom data application, you might use the [`meta` endpoint][ref-rest-meta]
of the REST API. It groups cubes into `connectedComponents` to help select those ones
that can be joined together.

### `Primary key is required when join is defined`

Sometimes, you might come across the following error message: `cube_a cube: primary key
for 'cube_a' is required when join is defined in order to make aggregates work properly`.

It indicates that you have a [cube][ref-cube] with joins and [pre-aggregations][ref-preaggs].
However, that cube doesn't have a primary key.

When generating SQL queries, Cube uses primary keys to avoid fanouts. A fanout happens
when two tables are joined and a single value gets duplicated in the end result, meaning
that some values can be double counted.

Please define a [primary key][ref-primary-key] dimension in this cube to make joins and
pre-aggregations work correctly.

If your data doesn't have a natural primary key, e.g., `id`, you can define a composite
primary key by concatenating most or all of the columns in the table. Example:

```yml
cubes:
  - name: cube_a
    # ...

    dimensions:
      - name: composite_key
        sql: CONCAT(column_a, '-', column_b, '-', column_c)
        type: string
        primary_key: true
```

### Transitive join pitfalls

Let's consider an example where we have a many-to-many relationship between
`users` and `companies` through the `companies_to_users` cube:

<CodeTabs>

```javascript
cube(`users`, {
  sql: `
    SELECT 1 AS id, 'Ali' AS name UNION ALL
    SELECT 2 AS id, 'Bob' AS name UNION ALL
    SELECT 3 AS id, 'Eve' AS name
  `,

  measures: {
    count: {
      type: `count`
    }
  },

  dimensions: {
    id: {
      sql: `id`,
      type: `string`,
      primary_key: true
    }
  }
})

cube(`companies`, {
  sql: `
    SELECT 11 AS id, 'Acme Corporation' AS name UNION ALL
    SELECT 12 AS id, 'Stark Industries' AS name
  `,

  dimensions: {
    id: {
      sql: `id`,
      type: `string`,
      primary_key: true
    },

    name: {
      sql: `name`,
      type: `string`
    }
  }
})

cube(`companies_to_users`, {
  sql: `
    SELECT 11 AS company_id, 1 AS user_id UNION ALL
    SELECT 11 AS company_id, 2 AS user_id UNION ALL
    SELECT 12 AS company_id, 3 AS user_id
  `,

  joins: {
    users: {
      sql: `${CUBE}.user_id = ${users.id}`,
      relationship: `one_to_many`
    },

    companies: {
      sql: `${CUBE}.company_id = ${companies.id}`,
      relationship: `one_to_many`
    }
  },

  dimensions: {
    id: {
      // Joins require a primary key, so we'll create one on-the-fly
      sql: `CONCAT(${CUBE}.user_id, ':', ${CUBE}.company_id)`,
      type: `string`,
      primary_key: true
    }
  }
})
```

```yaml
cubes:
  - name: users
    sql: |
      SELECT 1 AS id, 'Ali' AS name UNION ALL
      SELECT 2 AS id, 'Bob' AS name UNION ALL
      SELECT 3 AS id, 'Eve' AS name

    measures:
      - name: count
        type: count

    dimensions:
      - name: id
        sql: id
        type: string
        primary_key: true

  - name: companies
    sql: |
      SELECT 11 AS id, 'Acme Corporation' AS name UNION ALL
      SELECT 12 AS id, 'Stark Industries' AS name

    dimensions:
      - name: id
        sql: id
        type: string
        primary_key: true

      - name: name
        sql: name
        type: string

  - name: companies_to_users
    sql: |
      SELECT 11 AS company_id, 1 AS user_id UNION ALL
      SELECT 11 AS company_id, 2 AS user_id UNION ALL
      SELECT 12 AS company_id, 3 AS user_id

    joins:
      - name: users
        sql: "{CUBE}.user_id = {users.id}"
        relationship: one_to_many

      - name: companies
        sql: "{CUBE}.company_id = {companies.id}"
        relationship: one_to_many

    dimensions:
      - name: id
        # Joins require a primary key, so we'll create one on-the-fly
        sql: "CONCAT({CUBE}.user_id, ':', {CUBE}.company_id)"
        type: string
        primary_key: true
```

</CodeTabs>

With this data model, querying `users.count` as a measure and `companies.name`
as a dimension would yield the following error: `Can't find join path to join
'users', 'companies'`.

The root cause is that joins are [directed](#direction-of-joins) and there's no
join path that goes *by join definitions in the data model* from `users` to
`companies` or in the opposite direction.

In [Visual Model][ref-visual-model], you can see that both `users` and `companies`
are to the right of `companies_to_users`, meaning that there's no way to go
from `users` to `companies` moving left to right or right to left:

<Diagram src="https://ucarecdn.com/c7374e2c-7359-4c13-adee-74cb32623dc6/" />

One possible solution is to move one of two joins from the `companies_to_users`
cube to either `users` or `companies` cube. Please note that it would affect
the query semantics and thus the final result:

<CodeTabs>

```javascript
cube(`users`, {
  joins: {
    companies_to_users: {
      sql: `${CUBE}.id = ${companies_to_users}.user_id`,
      relationship: `one_to_many`
    }
  }

  // ...
})

cube(`companies_to_users`, {
  joins: {
    // users: {
    //   sql: `${CUBE}.user_id = ${users.id}`,
    //   relationship: `one_to_many`
    // },

    companies: {
      sql: `${CUBE}.company_id = ${companies.id}`,
      relationship: `one_to_many`
    }
  }

  // ...
})
```

```yaml
cubes:
  - name: users

    joins:
      - name: companies_to_users
        sql: "{CUBE}.id = {companies_to_users}.user_id"
        relationship: one_to_many

    # ...

  - name: companies_to_users

    joins:
      # - name: users
      #   sql: "{CUBE}.user_id = {users.id}"
      #   relationship: one_to_many

      - name: companies
        sql: "{CUBE}.company_id = {companies.id}"
        relationship: one_to_many

    # ...
```

</CodeTabs>

Now there's a join path that goes *by join definitions in the data model* from
`users` to `companies_to_users` to `companies`. [Visual Model][ref-visual-model]
also shows that you can reach `companies` from `cubes` by going left to right.

<Diagram src="https://ucarecdn.com/0736a570-c1c2-45b5-a806-e90a216ce202/" />


[ref-schema-ref-view]: /product/data-modeling/reference/view
[ref-schema-ref-joins-relationship]: /product/data-modeling/reference/joins#relationship
[ref-visual-model]: /product/workspace/visual-model
[ref-cube]: /product/data-modeling/reference/cube
[ref-cube-public]: /product/data-modeling/reference/cube#public
[ref-rest-meta]: /product/apis-integrations/rest-api/reference#base_pathv1meta
[ref-preaggs]: /product/data-modeling/concepts#pre-aggregations
[ref-primary-key]: /product/data-modeling/reference/dimensions#primary_key
[ref-data-model]: /product/data-modeling/concepts
[ref-queries]: /product/apis-integrations/queries
[ref-references]: /product/data-modeling/syntax#cubecolumn-cubemember
[ref-calculated-members]: /product/data-modeling/concepts/calculated-members
[ref-views]: /product/data-modeling/concepts#views
[ref-view-join-path]: /product/data-modeling/reference/view#join_path
[ref-preaggs]: /product/data-modeling/concepts#pre-aggregations
[ref-rest-api]: /product/apis-integrations/rest-api
[ref-sql-api]: /product/apis-integrations/sql-api
[ref-sql-api-joins]: /product/apis-integrations/sql-api/joins
[ref-rest-api-join-hints]: /product/apis-integrations/rest-api/query-format#query-properties